The First Sequences to Be Collected Were Those of Proteins

Abstract

Abstract Syntax Notation Sequence Format

Abstract Syntax Notation One (ASN.1) is a formal data description language that has been developed by the computer industry. ASN.1 (http://www-sop.inria.fr/rodeo/personnel/hoschka/asn1.html; NCBI 1993) has been adopted by the National Center for Biotechnology Information (NCBI) to encode data such as sequences, maps, taxonomic information, molecular structures, and bibliographic information. These data sets may then be easily connected and accessed by computers. The ASN.1 sequence format is a highly structured and detailed format designed specifically for computer access to the data. All the information found in other forms of sequence storage, e.g., the GenBank format, is present. For example, sequences can be retrieved in this format by ENTREZ (see below). However, the information is much more difficult to read by eye than a GenBank-formatted sequence. One would normally not need the ASN.1 format except when running a computer program that takes this format as input.
Genetic Data Environment Sequence Format

Genetic Data Environment (GDE) format is used by a sequence analysis system called the Genetic Data Environment, which was designed by Steven Smith and collaborators (Smith et al. 1994) around a multiple sequence alignment editor that runs on UNIX machines. The GDE features are incorporated into the SEQLAB interface of the GCG software, version 9. GDE format is a tagged-field format similar to ASN.1 that is used for storing all available information about a sequence, including residue color. The file consists of various fields (Fig. 2.13), each enclosed by brackets, and each field has specific lines, each with a given name tag. The information following each tag is either placed in double quotes or separated from the tag name by one or more spaces.

Figure 2.13. The Genetic Data Environment format.

CONVERSIONS OF ONE SEQUENCE FORMAT TO ANOTHER

READSEQ to Switch between Sequence Formats

READSEQ is an extremely useful sequence formatting program developed by D. G. Gilbert at Indiana University, Bloomington (gilbertd@bio.indiana.edu). READSEQ can recognize a DNA or protein sequence file in any of the formats shown in Table 2.3, identify the format, and write a new file with an alternative format. Some of these formats are used for special types of analyses such as multiple sequence alignment and phylogenetic analysis. The appearance of these formats for two sample DNA sequences, seq1 and seq2, is shown in Table 2.4. READSEQ may be reached at the Baylor College of Medicine site at http://dot.imgen.bcm.tmc.edu:9331/seq-util/readseq.html, and the appropriate files can also be obtained by anonymous FTP from ftp.bio.indiana.edu/molbio/readseq or ftp.bio.indiana.edu/molbio/mac.

Data files that have multiple sequences, such as those required for multiple sequence alignment and phylogenetic analysis using parsimony (PAUP), are also converted. Examples of the types of files produced are shown in Table 2.4. Options to reverse-complement and to remove gaps from sequences are included. SEQIO, another sequence conversion program for a UNIX machine, is described at http://bioweb.pasteur.fr/docs/seqio/seqio.html and is available for download at http://www.cs.ucdavis.edu/~gusfield/seqio.html.

Table 2.3. Sequence formats recognized by format conversion program READSEQ

1. Abstract Syntax Notation (ASN.1)
2. DNA Strider
3. European Molecular Biology Laboratory (EMBL)
4. Fasta/Pearson
5. Fitch (for phylogenetic analysis)
6. GenBank
7. Genetics Computer Group (GCG)
8. Intelligenetics/Stanford
9. Multiple sequence format (MSF)
10. National Biomedical Research Foundation (NBRF)
11. Olsen (in only)
12. Phylogenetic Analysis Using Parsimony (PAUP) NEXUS format
13. Phylogeny Inference Package (Phylip v3.3, v3.4)
14. Phylogeny Inference Package (Phylip v3.2)
15. Plain text/Staden
16. Pretty format for publication (output only)
17. Protein Information Resource (PIR or CODATA)
18. Zuker for RNA analysis (in only)

a For conversion of single sequence files only. The other conversions can be performed on files with single or multiple sequences.

Table 2.4. Multiple sequence format conversions by READSEQ

1. Fasta/Pearson format

    >seq1
    agctagct agct agct
    >seq2
    aactaact aact aact

2. Intelligenetics format

    ;seq1, 16 bases, 2688 checksum.
    seq1
    agctagctagctagct1
    ;seq2, 16 bases, 25C8 checksum.
    seq2
    aactaactaactaact1
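The kind of conversion shown in Table 2.4 can be illustrated with a minimal Python sketch. This is not READSEQ's code. The function names (parse_fasta, remove_gaps, reverse_complement, to_intelligenetics) are hypothetical and chosen only for illustration. The checksum that READSEQ writes on the Intelligenetics comment line is left out because its algorithm is not described here, and the characters "-" and "." are assumed to be the gap symbols.

```python
from typing import List, Tuple

# Translation table for complementing unambiguous DNA bases (assumption:
# only a, c, g, t in either case; other symbols are left unchanged).
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (name, sequence) pairs from Fasta/Pearson-formatted text."""
    records: List[Tuple[str, str]] = []
    name = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            parts = line[1:].split()
            name = parts[0] if parts else "unnamed"
            chunks = []
        elif name is not None:
            # Drop internal whitespace such as "agctagct agct agct".
            chunks.append("".join(line.split()))
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def remove_gaps(seq: str) -> str:
    """Strip alignment gap characters (assumed here to be '-' and '.')."""
    return seq.replace("-", "").replace(".", "")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def to_intelligenetics(records: List[Tuple[str, str]]) -> str:
    """Write records in an Intelligenetics-style layout (checksum omitted)."""
    lines: List[str] = []
    for name, seq in records:
        lines.append(f";{name}, {len(seq)} bases")  # comment line
        lines.append(name)                          # sequence name line
        lines.append(seq + "1")                     # "1" terminates the sequence
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    fasta_text = ">seq1\nagctagct agct agct\n>seq2\naactaact aact aact\n"
    print(to_intelligenetics(parse_fasta(fasta_text)), end="")
```

Run on the two sample sequences, the sketch reproduces the Intelligenetics layout of Table 2.4 apart from the checksum field. For real work an established converter is the safer choice, since it handles the many format variants that this sketch ignores.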

Publication date: 2002